SUMMARY


1.  Technical  Problem 


The task is to carry out the final development of a computer-based
system for automated instruction of the new speech sounds of second
languages, and to field-test this system for two language pairs:
English speakers learning Mandarin Chinese, and Spanish speakers
learning English.


2.  General Methodology

Laboratory  experiments  and  field  evaluations. 

3.  Technical  Results 


This report describes the first evaluation experiment of the Mark II
model of the Automated Pronunciation Instructor (API) system. Two
matched groups of students of Elementary Mandarin Chinese, enrolled at
two local universities, were studied. One group was tested and trained
with the API system; the other was simply tested within the same time
frame. Significant treatment effects were observed.

4.  Department of Defense Implications

Language schools of the Department of Defense give instruction in
approximately 65 languages to over 200,000 students each year. The
systems under development are designed to facilitate this
instructional process.

PREFACE


The present contract is a partial continuation of a research program
begun in 1966 under ARPA sponsorship. Of the four tasks at one time
funded under AFOSR Contract F44620-67-C-0033, the present task remains
active under Contract F44620-71-C-0065. This technical report covers
the period extending through 31 December 1973, and is devoted to a
description of experimental activities completed earlier in that
calendar year. It completes the description of the first phase of the
final testing of the Automated Pronunciation Instructor (API) system,
in one of two language pairs: English speakers learning Mandarin
Chinese pronunciation. The second evaluation, currently proceeding on
schedule at the University of Miami, Coral Gables, Florida, is much
more extensive. It involves Spanish speakers learning English
pronunciation. That field test will be the subject of future reports.

1.  INTRODUCTION

The purpose of the present experiment is to evaluate the effectiveness
of the Automated Pronunciation Instructor (API) system in modifying
the speech of English-speaking students of Mandarin Chinese. The
design concepts of the API have been detailed in previous technical
reports, but a brief sketch of the system and its operation in the
context of the English-Chinese language pair is presented here as a
prelude to the description of the experiment undertaken.

1.1  Background 

The central problem to which the API system addresses itself is that
students of new languages bring to their effort certain pronunciation
handicaps forced on them by their overlearned skill in their "mother
tongue." The distinguishing factor of the API approach is its
production of visual as well as auditory correlates of the utterances
of both student and teacher. By intelligent and interactive use of
this double-modality feedback, the student's pronunciation may be
improved in a manner unavailable to the student using audio feedback
alone. The relative inefficiency of the audio channel arises from the
nature of the second-language learning task: certain sound
distinctions that are phonemic in the target language are, by
coincidence, not present or open to free variation in the source
language; and the resultant inability of the student to perceive or to
produce those distinctions both defines and circumscribes the
parameters of his accent.

The API system deals with this problem by concentrating the efforts of
the student within those sound distinctions known by contrastive
language analysis to be major contributors to the overall accent he
exhibits. The predictability and generality of the problems across
many students of similar background and target-language objective
(referred to as students of a given "language pair") make possible a
group approach. At present, technical constraints have resulted in a
system that handles but one student at a time, but expansion to a
multi-station configuration is a feasible later goal, if warranted.
The evaluation experiments reported here have been carried out with
groups of students using the API system on a staggered-schedule basis.

1.2  A  Brief  Sketch  of  the  API  System 

The API system is built around a minicomputer (Digital Equipment
Corporation PDP-8e) which the student controls by means of a few
pushbuttons. It is actually easier for the student to manipulate this
system than the equipment in a conventional language laboratory with
facilities for recording and playback of student and teacher speech.
The API contains those features and adds to them a real-time visual
analysis of certain aspects of the student's speech.

The  visual  display  is  produced  in  such  a  way  as  to  accentuate 
the  expected  differences  between  his  and  the  teacher's  rendition 
of  a  selected  set  of  training  utterances.  Through  an  understanding 
of  the  relationships  between  visual  display  and  the  manner  of 
articulation,  the  student  is  guided  towards  articulatory  gestures 
more  closely  approximating  the  teacher's. 

The student wears a headband-mounted microphone, positioned close to
the mouth but out of the breath stream. He also wears a miniature
accelerometer, fastened to the throat with thin double-surfaced
adhesive tape. This transducer picks up the fundamental frequency, or
"tone," of the voice (i.e., the rate at which the vocal cords are
vibrating during voiced portions of speech). The
microphone-accelerometer assembly is comfortable for the student, who
quickly forgets its presence and concentrates on the task at hand.


The student receives feedback from a large display oscilloscope and a
high fidelity loudspeaker. The computer draws pictures on the screen
while performing its other chores of controlling data input, storage
and the rest of the equipment of the system. Descriptions of the
displays themselves will be given below in the context of the
curriculum.

The student informs the system which of several operations he wishes
to perform through the use of pushbuttons recessed within his work
table. There are buttons for recording, playback, display
manipulation, new training utterances, and other utility functions.

At no time during the operation of the system does the equipment make
an evaluation of the adequacy of the pronunciation of the student.
That is left to the student, on the hypothesis that the additional
information provided by the visual analysis, in conjunction with the
audio replay, will suffice to bring the student's abilities as a
pattern recognizer into play.

1.3  Phonological Contrasts in the English-Mandarin Chinese Language Pair

Two  major  pronunciation  problems  in  this  language  pair  were 
chosen  for  experimentation:  the  production  of  "tones"  and  the 
production  of  aspirate  and  unaspirate  voiceless  initial  stops. 

Any isolated syllable in Chinese (a sequence of optional consonant and
final vowel or vowels) can be pronounced with one of four tones, or
movements of the fundamental frequency over the voiced portion.
Depending on the tone used, the syllable's meaning changes. In the
English transliteration used in most American curricula, diacritical
markings above the vowels indicate the tone to be used.

In multisyllabic utterances the contours of the tones associated with
the component syllables may be modified. This is called "Tone Sandhi."
An example is the "half-third tone," a low and steady variant of the
normally low-scooping isolated third tone. The half-third occurs in
word-initial position. Another example is the "neutral" tone for
unstressed syllables. It corresponds roughly to the "schwa" vowel in
English. Its pitch contour is strongly dependent on the tone of the
preceding stressed syllable. Relations between adjacent tones are
often complex, and much drilling is required before the proper
combinatory behavior is achieved.

The aspirate and unaspirate voiceless initial stops in Chinese differ
from their counterparts in English. For example, the aspirate /p/ in
"pill" is produced by emitting a puff of air (aspiration) prior to the
onset of voicing. The corresponding Chinese aspirate initial, while it
may be transliterated similarly, differs in that the puff of air is
emitted with noticeably more force. The unaspirate opposite of /p/ is
/b/, and in English this is produced by beginning voicing prior to or
coincident with the opening of the lips, with the amount of air puffed
at the moment of opening much smaller than for /p/. The corresponding
Chinese unaspirate initial begins the voicing in exact coincidence
with the parting of the lips, with the intraoral pressure buildup at a
minimum. The free variation in voice onset time for English but not
Chinese may lead to confusion in the student between Chinese versions
of /p/ and /b/.

There are four basic contrasts grouped as aspirate/unaspirate
voiceless initials, depending on place of articulation. /p/ - /b/ was
described above. The second, /t/ - /d/, is the alveolar contrast, with
the emphasized aspiration of /t/ and the minimally aspirated,
non-prevoiced /d/. The third is /g/ - /k/, velar, with the /k/
produced in a manner easily confused by the student with the English
/g/. The last contrast is transliterated "c - z," with no direct
English equivalent. The "t's"-like sound of friction is emphasized in
the "c" and it occurs before the voicing onset of the following vowel.
In the "z" sound, voicing occurs earlier, but not before release.

2.  METHOD 

2.1  Selection  and  Pretesting 

English-speaking students of basic Mandarin Chinese were recruited
from the introductory Mandarin Chinese courses at Harvard University
and Massachusetts Institute of Technology. Brief presentations were
made in regular classes to explain the purpose and pay scale of the
experiment. All 14 volunteers were accepted into the study, half as
experimentals, half as controls.

A test list of utterances was compiled to be administered to both
groups three times. The first was a pretest given before training. The
second was a post-test immediately following the training of the
experimental group, and the third a retention test after a
no-treatment interval for both groups.

For  each  of  these  lists  the  students  read  a  series  of  24 
two-syllable  word  pairs  and  phrases,  under  conditions  controlled 
by  a  simple  set  of  instructions  read  from  the  display  screen  of 
the  API  system.  A  tape  recording  was  made  of  their  speech. 

Table 1 gives the list of utterances produced. The four sections
indicated on this list reflect the four segments of training
administered to the experimental students. There were six utterances
composed of minimal pairs of single, isolated tones. This section thus
tested production of unencumbered tone gestures. There were six
disyllabic, two-tone utterances, testing for the proper combination of
tones and including several words where tone sandhi radically alters
the rendition of a component. The next six tested utterances were also
disyllabic, but the second member was the so-called "neutral tone."
The final six utterances were minimal pairs differing not in tone but
in the initial stop.

Both  groups  were  given  the  pretest.  A  teacher  of  Mandarin 
Chinese,  who  had  recorded  the  API  training  tapes,  listened  to 
the  tapes  of  their  speech  and  rated  all  the  utterances  of  all 
the  volunteer  subjects.  An  informal  attempt  was  then  made,  with 
his  help,  to  divide  the  subjects  by  proficiency  equally  into  the 
experimental  and  control  groups.  Within  each  group  there  was  a 
great  deal  of  variance  in  pronunciation  abilities. 

TABLE 1.  TEST LIST

Section                          Tone(s)   No.     Utterance
Isolated tones (minimal pairs)   -         1-6     (not legible)
Two-tone combinations            -         7-8     (not legible)
                                 3-2        9.     MA FANG
                                 4-4       10.     YA LAN
                                 2-3       11.     LONG YA
                                 3-3       12.     MA YI
Various tones followed by        1-        13.     TA .LE
neutral tone                     3-        14.     YAO .LE
                                 2-        15.     HU .LE
                                 3-        16.     TAO .DE
                                 4-        17.     LA .DE
                                 4-        18.     YAO .LE
Aspirated-unaspirated            B-P, 1    19.     BEI PEI
voiceless initials,              T-D, 1    20.     TU DU
various tones                    T-D, 4    21.     DA TA
                                 K-G, 1    22.     KAI GAI
                                 Z-C, 3    23.     ZAO CAO
                                 Z-C, 1    24.     CAI ZAI

(Of items 1-8 only the fragments FA, HOU and FEI survive.)

2.2  Training

A curriculum was prepared in consultation with faculty teaching the
Introductory Mandarin Chinese courses at Harvard University and
Massachusetts Institute of Technology. The goal of this effort was a
set of materials that would supplement normal course work for the
students. The same orthographic system as used in the students'
textbooks was implemented on the API. The chosen subset of the
pronunciation problems they faced was presented in the same manner as
in the standard language laboratory materials available to all
students. Since it was impossible to provide supplemental non-API
training to the control group, it was important that they have access
to similar materials in the parent course. The control group received
no special treatment save the encouragement to utilize the language
laboratory curriculum that was equally available to both experimental
and control students.

The  seven  experimental  students  each  underwent  eight 
training  sessions  on  the  API  system.  Each  session  involved  from 
35  to  45  minutes  of  training  time  without  monitor  intervention. 


Sessions 1 and 2: Isolated identical tones. The first exposure of
experimental students to the system was done with the simplest
possible element of the curriculum. Each of the five tones was
represented by four or five items in the 24-stimulus wordlist shown in
Table 2. As in the parent course, the half-third tone was considered a
separate entity in early training even though it never occurs in
isolation. Each training utterance consists of two differing "carrier
syllables" with the same tone on each syllable.

The  speech  function  displayed  was  pitch.  A  few  minutes  of 
the  first  session  were  devoted  to  instruction  in  the  manipulation 
of  the  equipment  and  in  the  interpretation  of  the  display.  The 
monitor  soon  left  the  students  to  their  own  devices  and  only 
needed  to  provide  occasional  further  help. 

The  operation  of  the  computer  programs  had  been  made  flexible 
to  allow  variation  in  the  possible  approaches  to  different  problems. 

TABLE 2:  TRAINING LIST 1: ISOLATED IDENTICAL TONES

Function displayed: pitch; sliding Match, then vertical pair Match.

Discrimination   No.    Utterance
Tone 1            1.    A       MA
                  2.    YI      FEI
                  3.    TANG    FA
                  4.    YA      AI
                  5.    LAO     MAO
Tone 2            6.    A       MA
                  7.    YAO     MI
                  8.    FAN     MAN
                  9.    YI      FEI
                 10.    FA      YA
1/2 Tone 3       11.    A-      MA-
                 12.    AI-     YA-
                 13.    FEI-    YI-
                 14.    MAO-    YAO-
Tone 4           15.    A       MA
                 16.    AI      (illegible)
                 17.    FA      MI
                 18.    MAN     WAN
                 19.    FEI     MAO
Tone 3           20.    A       MA
                 21.    YI      AI
                 22.    MI      FEI
                 23.    LAO     MAO
                 24.    YA      FA

Many of the training sessions required different types of comparisons,
and so the display procedures were altered to maximize the visual
discriminability of the relevant parameters. The basic framework of
the display, constant throughout, contained space for one or two
teacher utterances and one or two student utterances. The student
could match his utterances with those of the teacher, or could match
his second with his first word, depending on what was being trained.

The major lesson to be learned in the first session was consistency in
the production of tones. To aid the students, the software operated a
"Match" function in "sliding mode." When the Match button was
depressed, the second member of both the student's and teacher's pair
of word traces described a smooth leftward motion until its first
point met the first word's starting point. In the second training
session "vertical pair" Match was used. While it was not strictly
necessary in the context of the first training word list, it served as
a simple introduction to the idea of inter-speaker comparison, used
later. The two student word traces were each moved up smoothly until
each one's starting point was at the same vertical position as that of
the corresponding teacher word. The student was instructed to attend
to the parallelism between his trace and the teacher's, and to
disregard absolute differences in fundamental frequency. The
logarithmic nature of the pitch display facilitated this.
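
The reason a logarithmic pitch axis helps with vertical matching can
be illustrated with a minimal sketch. It is not taken from the API
software: the function names and the sample frequencies are
hypothetical. On a log scale, two speakers producing the same contour
at different absolute pitches yield traces that differ only by a
constant vertical offset, so aligning the starting points makes the
traces coincide.

```python
import math

def log_pitch(f0_hz):
    """Convert fundamental-frequency samples (Hz) to log2 units (octaves)."""
    return [math.log2(f) for f in f0_hz]

def vertical_match(student_hz, teacher_hz):
    """Shift the student's log-pitch trace so its first point meets the teacher's.

    Hypothetical helper: returns (shifted student trace, teacher trace).
    """
    student = log_pitch(student_hz)
    teacher = log_pitch(teacher_hz)
    offset = teacher[0] - student[0]          # one constant vertical shift
    return [v + offset for v in student], teacher

# Illustrative data only: the student produces the teacher's rising
# contour exactly one octave lower.
teacher_hz = [200.0, 220.0, 250.0, 300.0]
student_hz = [100.0, 110.0, 125.0, 150.0]

shifted, target = vertical_match(student_hz, teacher_hz)
max_gap = max(abs(a - b) for a, b in zip(shifted, target))
print(f"largest remaining gap after matching: {max_gap:.6f} octaves")
```

On a linear frequency axis the same two traces would not be parallel
(the teacher rises by 100 Hz, the student by 50 Hz), which is why the
instruction to "disregard absolute differences" is easier to follow on
the logarithmic display.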

Sessions 3 and 4: Isolated tones in differing minimal pairs. Each of
the possible tone pairs was presented, including pairs with the
half-third tone as both the first and second member of the
two-syllable utterance. Table 3 shows the word list. Since the two
components of the minimal pair are pronounced as separate words, this
utterance is not normal in spoken Mandarin, but again had been used in
the parent course work. Two sessions were devoted to this wordlist.
The first of them used sliding mode Match, so that the students could
concentrate on producing the different tones in the proper pitch
relations to each other. Session 4 addressed the problem of timing
(tone duration) and used vertical-pair Match mode. Students could
still make intra-speaker comparisons of trace shape as well, because
they could compare their own two traces with the teacher's even before
using the Match operation.

TABLE 3:  LIST 2: ISOLATED DIFFERENT TONES

Function displayed: pitch; sliding Match, then vertical pair Match.

Discrimination   No.    Utterance
(illegible)       1.    A       A
1 - 1/2 3         2.    MA      MA-
(illegible)       3.    YI      YI
1 - 4             4.    FEI     FEI
2 - 1             5.    FA      FA
2 - 1/2 3         6.    YA      YA-
(illegible)       7.    AI      AI
2 - 4             8.    LAO     LAO
1/2 3 - 1         9.    FAN-    FAN
1/2 3 - 2        10.    MAN-    MAN
1/2 3 - 3        11.    MI-     MI
1/2 3 - 4        12.    MAO-    MAO
3 - 1            13.    A       A
3 - 2            14.    MA      MA
3 - 1/2 3        15.    YI      YI-
(illegible)      16.    FEI     FEI
4 - 1            17.    FA      FA
4 - 2            18.    YA      YA
4 - 1/2 3        19.    AI      AI-
4 - 3            20.    LAO     LAO
1 - 4            21.    FAN     FAN
2 - 3            22.    MAN     MAN
1 - 2            23.    MI      MI
(illegible)      24.    MAO     MAO

Sessions 5 and 6: Two-tone combinations. Tables 4 and 5 contain the
two word lists, each of which taps many of the possible two-syllable
tone combinations. Doubled tones are included since tone sandhi is
often a factor. Difficult combinations, such as those involving
special tones (half-third or half-falling fourth), are emphasized by
repetition.

Students worked on the above two lists for one session each. The
matching mode used for this material is called "vertical phrase,"
signifying that the entire student utterance is translated vertically
without subdivision to superimpose on the teacher's entire utterance.

At this point in the training, the schedule underwent a forced
modification. The experiment was being conducted during the Fall
semester. The planned termination of the training sessions had been
quite close to the Christmas holidays. However, earlier departures
were unexpectedly planned by at least two experimental students,
forcing the premature termination of the training for the entire
group. A decision was made to train the last two word lists with one
session each, rather than abandon either entirely. Such are the
limitations encountered in a semi-voluntary setting.

Session 7: Neutral tone following each of the four tones. This single
training session used the wordlist shown in Table 6. A syllable
written with no diacritical tone marker over its vowel and preceded by
a period is pronounced with an unstressed neutral tone whose duration
and contour depend on the preceding stressed tone. When the third tone
precedes the neutral, its production shifts to the half-third. As in
the other two-tone combinations, vertical phrase matching was used.

Session 8: Aspirate/unaspirate voiceless initial stops. The word list
shown in Table 7 was used in the last training session. Each of the
four contrasting consonant pairs is represented by a group of six
minimal pair items in this list. Successive items reversed the
direction of this discrimination; i.e., if one training item has the
aspirate member of the pair first, the succeeding minimal pair will
have the unaspirate member first. Tone within an item was constant,
and an effort was made to have all tones represented in each of the
four categories.

The display used for this material gave feedback principally on the
presence and time course of speech noise produced before voicing
onset. Both voice pitch and overall loudness of the speech were
plotted as a composite during voiced sections of utterances: the
familiar pitch trace appeared as before, but added above it was a set
of dimmer points at a distance above the pitch trace proportional to
the loudness of the voiced speech sound. Unvoiced speech sounds, which
in the earlier displays used by the students had produced no visual
feedback, now produced a single line near the bottom of the display at
vertical positions proportional to the loudness of the unvoiced sound
at that point in time. The distinction between voiced and unvoiced
sounds was thereby made clear to the speaker, and he was to use the
information in evaluating the relations between voiced and unvoiced
consonants along the lines discussed in the preceding phonological
introduction. Students reported little trouble in using the display
for consonants, and some reported that its pitch feedback served as a
good review of the simple tone production they had studied previously.

TABLE 4.  LIST 3a: 2-TONE COMBINATIONS

Function displayed: pitch; vertical phrase Match.

Discrimination   No.    Utterance
1 - 1             1.    TA TING
1 - 2             2.    TA LAI
1 - 3             3.    TA MAI
1 - 4             4.    TA MAI
2 - 1             5.    MEI TING
2 - 2             6.    MEI LAI
2 - 3             7.    MEI MAI
2 - 4             8.    MEI MAI
3 - 1             9.    NI TING
3 - 1            10.    MA AN
3 - 2            11.    NI LAI
3 - 2            12.    HAO WANR
3 - 3            13.    NI MAI
3 - 3            14.    LAO HU
3 - 3            15.    MEI MAN
3 - 4            16.    NI MAI
3 - 4            17.    MA LU
4 - 1            18.    YAO TING
4 - 2            19.    YAO LAI
4 - 3            20.    YAO MAI
4 - 3            21.    TAI LAO
4 - 4            22.    YAO MAI
4 - 4            23.    HAO WAI
4 - 4            24.    YAO FAN

TABLE 5:  LIST 3b: 2-TONE COMBINATIONS

Function displayed: pitch; vertical phrase Match.

Discrimination   No.    Utterance
1 - 1             1.    SAN FAN
1 - 2             2.    (illegible)
1 - 3             3.    TA HAN (?)
1 - 4             4.    FA LING
2 - 1             5.    LI MAO
2 - 2             6.    YAO LING
2 - 3             7.    HAI HAO
2 - 4             8.    FU LI
3 - 1             9.    LAO MAO
3 - 1            10.    TING HEI
3 - 2            11.    MAN TANG
3 - 2            12.    LIANG PING
3 - 3            13.    TU FEI
3 - 3            14.    MEI MAN
3 - 3            15.    LAO HU
3 - 4            16.    LI FA
3 - 4            17.    LAO HUA
4 - 1            18.    TAI HUA
4 - 2            19.    SU LAI
4 - 3            20.    FU MA
4 - 3            21.    LI FA
4 - 4            22.    MU TAN
4 - 4            23.    YAO FAN
4 - 4            24.    HAO WAI

TABLE 6:  LIST 4: NEUTRAL TONE

Function displayed: pitch; vertical phrase Match.

Discrimination   No.    Utterance
1                 1.    TING .LE
                  2.    SAN .GE
                  3.    FEI .DE
2                 4.    LAI .LE
                  5.    YI .GE
                  6.    PA .DE
1/2 3             7.    MAI .LE
                  8.    WU .GE
                  9.    PAO .DE
4                10.    MAI .LE
                 11.    LIU .GE
                 12.    TIAO .DE
1                13.    MA .MA
                 14.    TA .DE
                 15.    FAN .LE
2                16.    LI .BA
                 17.    MAI .LE
                 18.    LAN .DE
1/2 3            19.    HAO .BA
                 20.    FAN .DE
                 21.    SAO .LE
4                22.    HAI .TA
                 23.    LEI .LE
                 24.    SU .DE

TABLE 7:  LIST 5: ASPIRATE/UNASPIRATE VOICELESS INITIALS

(The tone column of the original table is not reliably legible and is
omitted.)

Discrimination   No.    Utterance
B-P               1.    BENG    PENG
                  2.    PAN     BAN
                  3.    BAI     PAI
                  4.    PAO     BAO
                  5.    BU      PU
                  6.    PIAO    BIAO
D-T               7.    DUI     TUI
                  8.    TI      DI
                  9.    DING    TING
                 10.    TAO     DAO
                 11.    DANG    TANG
                 12.    TONG    DONG
G-K              13.    GAN     KAN
                 14.    KANG    GANG
                 15.    GU      KU
                 16.    KUAI    GUAI
                 17.    GONG    KONG
                 18.    KAO     GAO
Z-C              19.    ZUI     CUI
                 20.    CAI     ZAI
                 21.    ZAO     CAO
                 22.    CAN     ZAN
                 23.    ZU      CU
                 24.    (illegible)
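
The composite display described for Session 8 can be sketched as a
mapping from analysis frames to screen points. This is only an
illustration of the stated layout rules (pitch trace plus dimmer
loudness points for voiced frames, a low loudness line for unvoiced
frames). The frame format, the function name and every scale factor
below are assumptions, not values from the API system.

```python
import math

def display_points(frames, floor_hz=60.0, loud_scale=0.5, noise_scale=0.2):
    """Map analysis frames to (x, y, brightness) points.

    frames: list of (voiced, f0_hz, loudness), loudness in 0..1.
    floor_hz, loud_scale and noise_scale are hypothetical parameters.
    """
    points = []
    for x, (voiced, f0_hz, loudness) in enumerate(frames):
        if voiced:
            y_pitch = math.log2(f0_hz / floor_hz)      # logarithmic pitch trace
            points.append((x, y_pitch, "bright"))
            # Dimmer point above the pitch trace; its distance from the
            # trace is proportional to the loudness of the voiced sound.
            points.append((x, y_pitch + loud_scale * loudness, "dim"))
        else:
            # Unvoiced noise: a single point near the bottom of the display,
            # at a height proportional to its loudness.
            points.append((x, noise_scale * loudness, "bright"))
    return points

# Illustrative frames: two unvoiced (aspiration) frames, then one voiced frame.
frames = [(False, None, 0.5), (False, None, 1.0), (True, 120.0, 0.8)]
points = display_points(frames)
for p in points:
    print(p)
```

In such a layout a strongly aspirated initial shows up as a longer and
taller run of low points before the pitch trace begins, which is the
cue the student was asked to compare against the teacher's trace.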

2.3  Post-  and  Retention-Testing 

Both groups of students were post-tested at roughly the same time.
Both groups read the same list of 24 test utterances they had first
seen at pretest time, following the same procedure. The material in
the testing list had not appeared in the training wordlists for the
experimental students, and it had been seen by both groups in the
course of their normal language laboratory work.

2.4  Evaluation  Procedures 

Each student had recorded his best attempts at the 24 test utterances
at three points in time. The test day tape recordings were copied,
cut, and spliced such that a set of 14 judgment tapes, one for each of
the students, was prepared. Each judgment tape began with the student
reading two sample English sentences, to enable a listening judge to
form some idea of the normal tone of the student's voice. Then
followed four similar sections, based on each of the four segments of
the test list. First, the six utterances as read by the native
Mandarin teacher were heard. Then, separated from each other by
approximately five seconds, the 18 versions of the six test utterances
were heard in a scrambled order whose only constraint was that the
same utterance's three versions could not be heard in three successive
positions. No identification of student or of testing day was
contained on the tape.
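
The scrambling constraint can be made concrete with a short sketch.
The report does not say how the orders were actually generated; the
reshuffle-until-valid approach, the function name and the placeholder
utterance labels below are assumptions chosen for illustration.

```python
import random

def scramble(utterances, versions=("pre", "post", "retention"), seed=0):
    """Return a random order of (utterance, version) items in which the three
    versions of one utterance never occupy three successive positions.

    Hypothetical helper: reshuffles until the constraint holds.
    """
    rng = random.Random(seed)
    items = [(u, v) for u in utterances for v in versions]
    while True:
        rng.shuffle(items)
        triple_run = any(
            items[i][0] == items[i + 1][0] == items[i + 2][0]
            for i in range(len(items) - 2)
        )
        if not triple_run:
            return list(items)

# Six placeholder utterances for one section of the test list.
section = ["U1", "U2", "U3", "U4", "U5", "U6"]
order = scramble(section, seed=1)
for position, (utterance, version) in enumerate(order, start=1):
    print(position, utterance, version)
```

Because only runs of three are forbidden, most random orders already
satisfy the constraint and the loop rarely needs more than a few
shuffles.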

Five instructors of Introductory Mandarin Chinese from Boston area
universities, all Mandarin natives, served as paid judges. Each judge
worked alone in the API student room, listening to the 14 judgment
tapes played over the student loudspeaker at a comfortable listening
level. The order of students was unique for each judge, and was
counterbalanced to compensate for increasing familiarity with the
judgment task. Two 15-minute rest periods were interposed within the
approximately 4-hour course of each judge's ratings.

Written instructions (included in Appendix 1) for the judges explained
the rating scale they were to apply. Each test utterance was to be
assigned an integer from 0 to 4, with higher numbers associated with
better performance.

To aid them further in their task, each judge had a short-form rating
instruction sheet (Appendix 2) and, for each judgment tape, an actual
script of the order of the utterances (a sample is shown in Appendix
3). This "answer sheet" did not, of course, identify either student or
test day, but it did serve to inform the judge of what test utterance
the student was in fact attempting. This was particularly valuable in
cases of gross student error. Three orthographic systems were used in
identifying the test utterances on the judges' sheets, so that they
could utilize the most familiar one. Judges wrote their accent ratings
in a blank following each line of the answer sheet.

Judges  could  ask  the  assistant  to  stop  or  replay  the  tape  to 
give  them  more  time  to  come  to  a  decision,  but  these  requests 
diminished  over  time  and  the  data  were  gathered  without  incident. 

Since  both  groups  of  students  were  part  of  a  larger-scope 
course  in  basic  Mandarin  Chinese,  it  was  expected  that  their 
overall  Chinese  speech  quality  would  improve  through  time,  ir¬ 
respective  of  their  status  within  the  experiment.  The  central 
question addressed to the data was, therefore, whether there was
a differential improvement between the students using the API
system and those not using it.

The  word  lists  used  for  testing  were  divided  by  the  four 
types  of  training  materials:  separate  single  tones,  disyllabic 
combinations,  neutral  tone  disyllables,  and  consonant  contrasts. 
A measure of the student's performance for each of the four sections
of the test lists, for each test day, over all judges was
obtained. Then the data for all experimental subjects were
combined and compared with all the controls.

A  judgment  is  defined  as  the  score  given  by  one  judge  to 
one  word  spoken  by  one  subject  on  one  test  day.  Comparing,  for 
example,  pre  and  post  judgments  of  one  word,  a  subject  could 
receive  a  higher  post  score,  a  lower  post  score  or  the 
same  score.  If  he  received  a  higher  post  score,  he  improved 
his  pronunciation  of  that  word  from  the  pretest  to  the  post 
test  according  to  the  judge.  Two  comparisons  were  made:  pre  vs. 
post  tests  and  pre  vs.  retention  tests. 
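
The definition above amounts to a three-way classification of each paired judgment. The sketch below shows that bookkeeping; the ratings, the key layout (judge, subject, word), and the function name `classify` are hypothetical and are not data from the experiment.

```python
from collections import Counter


def classify(pre, later):
    """Label one paired judgment by the direction of change."""
    if later > pre:
        return "improved"
    if later < pre:
        return "poorer"
    return "no change"


# Hypothetical 0-4 ratings keyed by (judge, subject, word).
pre_scores = {("J1", "S1", "w1"): 2, ("J1", "S1", "w2"): 3, ("J2", "S1", "w1"): 1}
post_scores = {("J1", "S1", "w1"): 3, ("J1", "S1", "w2"): 3, ("J2", "S1", "w1"): 0}

# Tally the direction of change over all paired judgments.
tally = Counter(classify(pre_scores[k], post_scores[k]) for k in pre_scores)
```

The same tally, run once against the post-test scores and once against the retention-test scores, gives the two comparisons described.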

3.  RESULTS 

Considering first the pre vs. the post tests, over half the
judgments made by all judges, for all the subjects and all the
words, showed no change in pronunciation ability. For the
experimentals, 58 percent, and for the controls, 62 percent, of all
judgments made on the pre-test words did not change on the
post-test. Of the judgments that did change, the experimentals were
more likely to have improved than the controls, while the controls
were about equally likely to have scored lower as higher when
changes occurred from pre to post tests. Controls improved 54
percent of the time when they changed, while the experimentals improved
on 73 percent of all changed judgments. This difference is
significant (p<.001). Table 8 gives more detail.
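
The significance test reported above can be checked from the counts in Table 8. The sketch assumes that the statistic was an uncorrected Pearson chi-square on the 2 x 2 table of changed judgments (group by direction of change); the report does not name the formula, and the function name `chi_square_2x2` is hypothetical. Under that assumption the computation reproduces the tabled value and the two improvement rates.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (df = 1, no continuity correction) for the
    2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Table 8, changed judgments only: improved and poorer counts per group.
exp_improved, exp_poorer = 257, 96
ctl_improved, ctl_poorer = 170, 146

chi2 = chi_square_2x2(exp_improved, exp_poorer, ctl_improved, ctl_poorer)
exp_rate = exp_improved / (exp_improved + exp_poorer)
ctl_rate = ctl_improved / (ctl_improved + ctl_poorer)

print(round(chi2, 2), round(100 * exp_rate), round(100 * ctl_rate))
# prints: 26.09 73 54
```

The "no change" judgments are left out of the table, which matches the statement that only judgments indicating change were included in the test.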

The pre- vs. retention-test comparisons showed similar
trends. The experimental subjects retained the improvements they
had made on the post tests. The controls, who showed very

TABLE 8.  PRE-POST TEST COMPARISONS OVER ALL WORDS

                                     Experimentals        Controls
                                       #       %          #       %
Total number of judgments
indicating no change                  487      58        524      62

Total number of judgments
indicating improvement                257      31        170      20

Total number of judgments
indicating poorer pronunciation        96      11        146      18

χ² including only judgments indicating change = 26.09   df = 1


little average change from the pre to the post test, improved
their performance on retention. Of the judgments that did show
change, the experimentals improved on 73 percent and the controls
on 66 percent. There was still a large number of judgments,
in both groups, that showed no change in performance from pre to
retention tests: 56 percent of all judgments for the experimentals
and 67 percent for the controls. See Table 9.
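
For a chi-square statistic with one degree of freedom, the upper-tail probability can be obtained from the complementary error function, since the statistic is the square of a standard normal deviate. The sketch below applies that identity to the two overall values reported in Tables 8 and 9; the function name `p_value_df1` is hypothetical, and the probabilities it yields are a reader's check rather than figures quoted from the report.

```python
import math


def p_value_df1(chi2):
    """Upper-tail probability of a chi-square statistic with 1 degree of
    freedom, using the identity P(X > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(chi2 / 2.0))


p_post = p_value_df1(26.09)      # Table 8, pre vs. post
p_retention = p_value_df1(3.61)  # Table 9, pre vs. retention
```

On this calculation the pre-post value lies far below .001, while the pre-retention value comes out near .057, close to the conventional .05 level.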

The distribution of "no change" judgments was even over all
four word groups for experimentals and controls. See Table 10.

The controls were not more or less likely than the experimentals
to "not change" from pre to post tests or pre to retention tests.
None of the four stimulus word groups was more or less likely
than any other to show changed judgments.

The greatest differential improvement of experimentals over
controls occurred on the first group of the stimulus list, the
isolated single tones. This was the simplest element of the
curriculum. The subjects had received four relevant sessions of
training for this type of material. Whether the differences in
performance arise from the type or amount of training given or
the nature of the stimulus material cannot be ascertained.

However, significantly more of the judgments of improvement
occurred among the experimentals than among the controls.
See Table 11. The differences between experimentals and
controls on this group of tones also remained significant on
the retention test. See Table 12.
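
The word-group comparisons can be checked in the same way from the cell counts of Table 11. The sketch assumes an uncorrected Pearson chi-square on each 2 x 2 table (the report does not name the formula); the function name `chi_square_2x2` and the dictionary layout are hypothetical. The counts are those read from Table 11, in the order experimental improved, experimental poorer, control improved, control poorer.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (df = 1, no continuity correction) for the
    2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Table 11 counts per word group:
# (exp improved, exp poorer, ctl improved, ctl poorer)
table_11 = {
    1: (61, 32, 23, 47),
    2: (78, 24, 57, 28),
    3: (56, 26, 46, 34),
    4: (62, 14, 44, 37),
}

results = {group: round(chi_square_2x2(*cells), 2) for group, cells in table_11.items()}
print(results)
# prints: {1: 17.13, 2: 2.05, 3: 2.02, 4: 13.28}
```

These values agree with the chi-square entries given in Table 11, with word groups 1 and 4 exceeding the df = 1 critical value for p < .001 (10.83) and word groups 2 and 3 falling well short of significance.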

The second and third test word groups were the disyllabic
combinations and the neutral tone disyllables. The experimentals
consistently had a greater number of judgments showing improvement
than the controls, over both word groups and on the pre-post

TABLE 9.  PRE-RETENTION TEST COMPARISONS OVER ALL WORDS

                                     Experimentals        Controls
                                       #       %          #       %
Total number of judgments
indicating no change                  469      56        563      67

Total number of judgments
indicating improvement                272      32        184      22

Total number of judgments
indicating poorer pronunciation        99      12         93      11

χ² including only judgments indicating change = 3.61

TABLE 11.  PRE-POST TEST COMPARISONS BY WORD GROUP

WORD GROUP 1                             Experimentals    Controls
Total number of judgments indicating
improvement                                    61            23
Total number of judgments indicating
poorer pronunciation                           32            47
χ² = 17.13   df = 1   p < .001

WORD GROUP 2                             Experimentals    Controls
Total number of judgments indicating
improvement                                    78            57
Total number of judgments indicating
poorer pronunciation                           24            28
χ² = 2.05   df = 1   p < .25

WORD GROUP 3                             Experimentals    Controls
Total number of judgments indicating
improvement                                    56            46
Total number of judgments indicating
poorer pronunciation                           26            34
χ² = 2.02   df = 1   p < .25

WORD GROUP 4                             Experimentals    Controls
Total number of judgments indicating
improvement                                    62            44
Total number of judgments indicating
poorer pronunciation                           14            37
χ² = 13.28   df = 1   p < .001

TABLE 12.  PRE-RETENTION TEST COMPARISONS BY WORD GROUP

WORD GROUP 1                             Experimentals    Controls
Total number of judgments indicating
improvement
Total number of judgments indicating
poorer pronunciation
χ² = 30.45   df = 1   p < .001

WORD GROUP 2                             Experimentals    Controls
Total number of judgments indicating
improvement
Total number of judgments indicating
poorer pronunciation
χ² = .63   df = 1

WORD GROUP 3                             Experimentals    Controls
Total number of judgments indicating
improvement
Total number of judgments indicating
poorer pronunciation
χ² = .99   df = 1

WORD GROUP 4                             Experimentals    Controls
Total number of judgments indicating
improvement
Total number of judgments indicating
poorer pronunciation
χ² = .06   df = 1


and pre-retention comparisons. The contrast between the
experimentals and the controls was not as great as on the first word
group, however. Only three training sessions were given to these
tone combinations, both using inter-speaker comparisons. The rate
of improvement of the experimentals was about the same as on the
first word group; the controls showed more improvement than they
had on the single tones, and the differences between the groups
were not as great.

The fourth test word group consisted of consonant contrasts.
On this set of words the experimentals improved significantly
over the controls on the pre to post test but not on the
pre-retention test comparison. Of the judgments that did indicate
change, the experimentals improved on 82 percent of the post-test
judgments compared with 54 percent for the controls, and on
73 percent of the retention judgments compared with 75 percent
for the controls. Aspirate and unaspirate voiceless initial
stops could only be trained for one session, with the more
complex pitch-loudness composite display.

4.  DISCUSSION 

Despite  the  severe  limits  in  the  breadth  of  the  student 
sample  and  in  the  time  available  for  training,  a  real  improve¬ 
ment  was  generally  observed  in  the  Chinese  speech  of  the  students 
exposed  to  the  API  system,  an  improvement  significantly  greater 
than  that  observed  in  students  tested  similarly  but  exposed  only 
to  the  "parent"  Chinese  course.  One  must  keep  the  limitations 
of  the  present  experiment  in  mind  when  assessing  the  performance 
of  the  API  system  in  this  situation.  Though  the  effects  observed 
were  small  statistically,  they  are  nonetheless  real,  and  their 
size is probably limited more by the scope of the work than by the
efficacy of the system. To have observed significant treatment
effects in the face of short training time and an inherently
"noisy" evaluation procedure speaks strongly for the robustness
of that treatment effect.

The  meaning  of  the  treatment  effect  should  be  evaluated  in 
light  of  two  opposed  factors.  On  the  one  hand,  the  test  list  was 
drawn  from  parent  course  materials,  so  that  both  experimental  and 
control  students  would  have  the  same  basic  familiarity  with  the 
utterances.  Furthermore,  materials  tested  had  not  been  included 
within the training materials used by the experimental students.
Any observed treatment effects can thus be ascribed to differential
pronunciation  ability  rather  than  to  increased  familiarity  with  the 
testing  utterances.  On  the  other  hand,  the  sample  of  speech 
behavior  obtained  from  the  students  intentionally  included  only 
utterances  of  a  type  similar  to  those  trained,  so  that  any  possible 
treatment  effects  would  stand  out  in  sharp  relief. 

One  consequence  of  the  limited  scope  of  the  speech  behavior 
tested  is  the  restriction  on  the  inferences  that  may  be  drawn 
concerning  the  overall  pronunciation  abilities  of  the  experimental 
subjects.  This  was  done  with  the  realization  that  the  most 
sensitive  means  of  evaluation  could  be  applied  only  to  speech 
behaviors easily judged and reliably produced. The primary
hurdle the API must pass is a demonstration that it can
produce  improvements  in  accent,  but  it  is  unrealistic  to  expect 
either  that  (a)  training  on  a  specific  set  of  accent  problems 
will  produce  an  across-the-board  improvement  in  pronunciation, 
or  (b)  that  a  panel  of  accent-rating  judges  can  make  reliable 
responses  concerning  anything  as  multidimensional  as  "total 
accentedness"  of  a  set  of  utterances.  The  evaluation  method 
chosen, and the statistical procedure used to reduce the data,
were therefore designed to produce maximal sensitivity to change
while at the same time avoiding the more complex method of
complete pair-comparisons. A single-stimulus rating technique by a
panel of judges produced responses that could be subjected to a
pair-comparison-type analysis, if due regard were given to the
permissible operations on the data. As it happens, one is in
fact interested not in comparisons between specific words and
subjects, but in accent parameters (i.e., specific word groups),
treatments (i.e., experimental or control), and testing times
(pre-, post-, or retention-testing data). The present analysis
provides answers to questions posed along those lines, having
minimized the variance produced by both the speech production and
subjective judgment processes.

The  major  price  paid  in  the  analysis  is  the  large  number  of 
"no  change"  judgments  encountered.  These  result  largely  from 
the  coarse  grain  of  the  judgment  scale.  Taking  this  price  into 
account,  one  is  still  left  with  a  reasonable  statement  of  the 
null  hypothesis  as  regards  the  treatment  effect:  that  there  is 
no  difference  between  treatment  groups  in  the  distribution  of 
"improved"  versus  "poorer"  pronunciations.  That  hypothesis  fails 
of acceptance in a consistent manner throughout the above
analysis.

Tables 8 and 9 showed that the number of equivocal judgments
for all test words was smaller for experimentals than for
controls, in both pre-post and pre-retention comparisons.
Furthermore, it was shown that when there was a change, it was
significantly more often in the direction of improvement for the
experimentals than for the controls; they learned more and
retained it better.

After having been assured by Table 10 that the equivocal
judgments distribute themselves evenly across the four word
groups, it becomes reasonable to inspect individual word groups'
changed judgments for differences in distribution as a function
of treatment. Again, it is found (in Tables 11 and 12) that in
each word group and for both pre-post and pre-retention
comparisons, the experimentals' changes are always in the direction of
greater  improvement,  and  significantly  so  in  three  out  of  the 
eight  specific  comparisons  made.  The  strong  showing  made  by 
word  group  1  is  not  surprising;  it  received  the  largest  share 
of  the  training  time,  and  was  conceptually  the  simplest  display. 
The  unexpectedly  strong  treatment  effect  observed  in  word  group 
4  is  most  easily  explained  by  the  action  of  the  pitch-loudness 
composite display used there. Even though the training time
available to the experimentals for this word group was but one
session, they apparently profited greatly from even this brief
exposure  to  the  display.  Since  all  observed  effects  favored 
the  experimental  treatment,  it  is  reasonable  to  take  the  position 
that  a  simple  increase  in  training  time  might  have  brought  all 
differential  treatment  effects  to  significant  levels. 

At  this  writing,  the  final  field  tests  of  the  API  system 
are  underway  at  the  University  of  Miami's  Intensive  English 
Program,  Coral  Gables,  Florida,  with  Spanish  speakers  learning 
English.  This  experimentation  is  much  broader  in  scale.  Ex¬ 
perimental  variables  are  under  better  control  in  that  situation, 
and  the  scope  of  problems  trained  and  measurements  taken  is 
larger.  The  work  reported  above  gives  reason  for  optimism,  be¬ 
cause  even  when  the  system  is  tested  under  less  than  optimal 
conditions,  significant  benefits  accrue  to  its  students.  Sub¬ 
sequent  reports  in  this  series  will  describe  the  results  of  a 

APPENDIX 1

INSTRUCTIONS TO JUDGES


Your task today is to evaluate Chinese utterances made by students of
Introductory Mandarin Chinese, who were also subjects in an experiment designed
to test a Chinese pronunciation teaching-machine. Each student read a set of test
words at various times throughout the experiment. We wish to find out whether
the students' pronunciation of those test words changed over time. The utterances
have been randomly scrambled and collected onto "judgment tapes," one judgment
tape for each student. You will sit alone in a listening room and you will assign
a numerical grade to each utterance as you hear it. The tape contains adequate
time for you to consider and respond to each item, before the next one is heard.

If you need additional time, or if you want to pause for any reason, there is
a microphone connected outside, enabling you to ask the operator to wait. When
you are ready to resume, tell him and things will proceed.

There are two booklets to aid you in assigning the grades to the students'
utterances. The small, four-page booklet is the key to what the utterances are,
and to how the grading is to be made. Each page corresponds to one of the four
sections of the tape from each student. Each section deals with six words or
word pairs. The bottom half of each page contains transcribed English and two
Chinese script versions of the six words that have been scrambled up three times
to form one 18-utterance section of each student's tape. The top half of each
page gives a brief synopsis of the grading scheme for each section. (The last
part of these instructions will give you detailed information on how to grade
each section's utterances; for now, let us assume that you will, in general,
assign each utterance a grade ranging from 0 to 4, bad to good, in accordance
with the instructions and with your judgment.)


The thicker booklet is your key to the utterances themselves. It is,
in essence, a script that tells you what word(s) the student was actually
attempting to produce. It will help you keep your place. It gives you a blank
space within which you are to write your judgment of each utterance. It will
be especially helpful when the student's version of the intended utterance is
garbled. By knowing what the student was trying to say, you can better judge
how well he succeeded. Make sure that each line receives a written response from
you: either 0, 1, 2, 3, or 4. If you need more time to consider your judgment,
just ask for a pause. If you would like to hear any utterance over again, just
ask for it.

Here is a view of what the judgment procedure is for the entire session.
There will be 15 judgment tapes played. There will be a short break between tapes.
Each tape has the same format as the others. The first voice you hear will not
be that of the student whose utterances are collected on the tape; it will be
an identifier for the tape number. Make sure that it corresponds to the tape
number written on the top of the next sheet of the judgment booklet. If it does
not, tell the operator, because the script will then not agree with the words you
hear. At the start, then, the first page of the judgment booklet corresponds to
the first section of the first tape.

After you have correctly identified the tape number and assured that your
judgment booklet is on the right page, you will hear the student for the first
time. He will read two sentences: "Joe took father's shoe bench out." and
"She was waiting at my lawn." These sentences are merely for the purpose of
acquainting you with the voice of the student before each tape actually begins.
Through these introductory sentences, you can form an impression of his or her
normal tone of voice, so that abnormal tone range will be apparent from the first
time it appears.


Each judgment tape then continues with the four sections of 18 scrambled
utterances of the student. For the first few judgment tapes, the operator will
precede each 18-utterance section with a recording of a Mandarin speaker pronouncing
the six utterances in the order given on the bottom of the four-page booklet.
This is to familiarize you with the timing of the utterances, and to give you
an example of the type of pronunciation that the students were attempting to
imitate. As you become more experienced in listening to these tapes, you will
have less need to hear the introductory Mandarin-native introduction to each of
the four sections, and the operator will skip over it. If you want to hear it,
just ask. At the end of the last teacher-version, there is a 10-second pause,
and then the 18 utterances of the student will be heard. You will respond to
each of them by placing a number in the appropriate blank of the answer sheet
for that tape and section.

And now: What do those numbers mean? How are you to decide? First,
remember that you are a native speaker of Mandarin, and you will have an instant
opinion of each of the utterances, as to how they compare to your internal standard.
Your teaching experience, and some knowledge of the mechanism of speech production --
especially for tones -- will also help you a great deal in assigning judgments.

The  utterances  you  will  judge  are  quite  short,  which  makes  your  job  easier 
since  there  are  fewer  aspects  of  each  utterance  that  you  need  to  consider  in 
making  your  judgment.  Also,  we  are  asking  you  to  disregard  certain  irrelevant 
aspects  of  the  students'  speech,  since  they  were  only  trained  in  the  production 
of  (in  sections  1,  2,  and  3)  proper  tones  and  (in  section  4)  proper  initial 
aspirate  and  unaspirate  stops.  The  top  line  of  the  four-page  handout  indicates 
what  was  trained  (i.e.,  what  to  pay  attention  to)  and  what  to  disregard  in  making 
your  judgments. 


SECTION I:  Separate tones

In this section, as in all the rest, if the student's utterance (for the
appropriate aspects) is OK, score it 4. If it is less than OK, think of the
following breakdown of his performance. There are two words, each with two
aspects: duration (total time for the tone) and contour (voice pitch as a
function of time). While they are not really separable, try to make them so
for the present purpose. The two tones are also produced with a given relative
pitch level. If you can pinpoint just one error in the two-word utterance,
score it 3. Possible errors, then: (a) one tone too short or too long, (b)
one contour off slightly, (c) both tones OK but relative pitch wrong, (d) "just
slightly off -- and definitely not OK," etc. These might be 3's. A score of 2
would be as indicated in the handout, and the remaining grades are self-explanatory.
If the preceding sounds too complicated, remember the general idea and assign
the grades from 0 to 4 according to the left-hand side of the grading description
on the handout: 4 for OK, unaccented and 0 for unacceptable, a total miss.

Remember that the two words, while spoken together, are not really part
of a complete two-syllable utterance. Their only point of relationship is in
their relative levels. The amount of time the speaker pauses between words is
irrelevant.

SECTION II:  Two-syllable tone pairs

Here, the two syllables are supposed to be pronounced together, and the
linkage between them is a subject for scrutiny. The durations and contours of
the two are important, the manner of their linkage is important, and the existence
of tone sandhi is very important. Again, disregard all aspects of the utterances
except the tones.

A rating of 4 signifies that the utterance is OK, unaccented. Give a 3
when it is "almost OK," but do not count as 3's any attempt that lacks the proper
sandhi (influence by syllable 2 on syllable 1's tone structure). Give a 2 to
utterances where there are two errors, and reserve 1 for sounds that are "better
than nothing" or which are two tones lacking proper sandhi when appropriate.
Give 0 to bad tries. As before, the general ordering from "4 - OK" through
"0 - bad" is an alternative mode of consideration for the judgments in this section.

SECTION  III:  Neutral  tone  as  second  member  of  two-syllable  tone  pairs 

Use the same general approach as in Section II. The second syllable, the
neutral tone, is short and doesn't have much contour, but its linkage to syllable
one, and its sandhi upon syllable one, are of great interest.

SECTION  IV:  Aspirated  and  unaspirated  initial  consonants 

Here, you are to try to disregard vowels and tones, and concentrate your
attention on how well the speaker produces the consonants. The six word pairs
alternate in which member of the pair is aspirated and which not. Each initial
consonant has two general aspects: voice-onset time and voice quality. The
aspirate stops should exhibit the right sound of friction for the right amount of
time before the vowel begins. The unaspirate stops should have a far shorter
period of friction before the vowel, and they too should sound correct during
the consonant portion. As you know, unaspirate stops must not be prevoiced in
Mandarin. Follow the handout in assigning grades to these utterances. For
example, give a 3 to a word pair where one word is OK and the other has one of
the above errors.


GENERAL COMMENTS:


We realize that we are asking you to do a difficult task. We realize
further that your grades may change over time. The purpose of the above standards
is to provide you with some sort of absolute yardstick, but invariability is
hard to come by in human judgments. We realize this too, and have allowed for
it; so just try to do as well and as consistently as you can.

We expect that you will work as carefully and as conscientiously as
possible. Much hangs in the balance in this experiment, and so we wish you to
consider your judgments as carefully as possible within the time available.
Remember that you are being paid about 5¢ per judgment, and try to provide your
full attention to each utterance, disregarding any extraneous sounds that may
have remained on the judgment tapes.

There will be speakers whose performance is better than others. Try not
to let your scale become relative only to the present speaker, sliding up and
down to match the level of each speaker. Try to remain unmoved by swings in
ability, but to judge each speaker and indeed each utterance as an independent
event. Your increasing experience in this judgment situation may cause some
shifts through the entire session; don't become overly concerned with this.
If you follow the general guidelines, that is enough for our purposes. Don't
try to artificially distinguish between performances that are only slightly
different. The categories are fairly broad, and a given level of grading can
encompass utterances that differ.

What we are saying is: Try your best to give us a frank impression of how
well each speaker produces each utterance -- the better the performance, the
higher the score. If you follow the strategy outlined above, we will be satisfied.

BY ALL MEANS ASK ANY QUESTIONS YOU WISH, NOW OR AT ANY TIME DURING THE SESSION.

APPENDIX 2

SHORT-FORM  RATING  INSTRUCTION  SHEET 

Section I:  Separate tones
(disregard vowels and consonants)

WORD ONE              Relative Level              WORD TWO
  Duration                                          Duration
  Contour                                           Contour

4:  Unaccented        All above points
3:                    One error above
2:                    "Half-credit"; one error in one tone,
                      relative level wrong.
1:                    "Better than nothing"; one word OK, the
                      other wrong.
0:  Unacceptable      Total miss


- 

/ 
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fa 
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Y 
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«/ 

Section II:  Two-syllable tone pairs
(disregard vowels and consonants)

Syllable One              Relative Level          Syllable Two
  Duration                                          Duration
  Contour                                           Contour
  (Proper influence
   of Syllable Two)

4:  Unaccented        All above points
3:                    "Almost OK"; one error above, except
                      proper "two-on-one" influence.
2:                    "Half-credit"
1:                    "Better than nothing"; e.g., no
                      "two-on-one" influence, etc.
0:  Unacceptable      Nothing


7. 

■fa-  >vn  n^ 
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1. 
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ya-  lan 

</• 
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v  V 

Section IV:  Aspirated and unaspirated initial consonants
(disregard vowels and tones)

UNASPIRATED WORD (B,D,G)                      ASPIRATED WORD (P,T,K)
  No prevoicing, proper voice-onset time        Proper voice-onset time
  Proper spectral quality                       Proper spectral quality

4:  Unaccented        All above points
3:                    One word not quite OK
2:                    "Half-credit"; both not quite OK, or
                      one word wrong
1:                    "Better than nothing"
0:  Unacceptable      Nothing
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bu 
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V/ 
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H 

AY. 

cai 

zai 

APPENDIX 3

SAMPLE SCRIPT GIVING THE ORDER
OF UTTERANCES IN ONE SUBJECT'S JUDGMENT TAPE


BBN SECOND-LANGUAGE PROJECT
CHINESE EVALUATION TEST DATA SHEET

TAPE NO.: 20        SECTION:            JUDGE:            DATE: